Speech cohesion for topic segmentation of spoken contents

Authors

  • Abdessalam Bouchekif
  • Géraldine Damnati
  • Delphine Charlet
Abstract

In this paper, we introduce the notion of speech cohesion for topic segmentation of spoken content. The aim is to integrate speaker information and lexical information within a single cohesion value. Building on a lexical cohesion system, we propose an approach that directly integrates the speaker distribution when computing the cohesion. A potential boundary is confirmed if the joint distribution of terms and speakers differs sufficiently from one side of the boundary to the other. Beyond speaker distribution, we also propose to take speaker identification into account and to compare speaker identities with the identities mentioned in the spoken content, in order to reinforce the cohesion of a topic segment. Experiments run on three corpora of various Broadcast News formats, collected from 9 French TV channels, show a significant improvement in the overall topic segmentation process.
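The idea of scoring a candidate boundary from the joint distribution of terms and speakers can be illustrated with a minimal sketch. This is not the authors' formula: it assumes a simple cosine similarity between two windows of speaker turns, where each window is a bag of word counts plus speaker counts weighted by the number of words spoken. The names `window_features`, `speech_cohesion`, `speaker_weight`, and `window` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def window_features(turns, speaker_weight=1.0):
    """Build one joint term/speaker vector for a list of (speaker, words) turns.

    Assumption: a speaker contributes in proportion to the number of words
    spoken, scaled by speaker_weight. Keys are tagged so that a word and a
    speaker label can never collide.
    """
    feats = Counter()
    for speaker, words in turns:
        feats.update(("word", w) for w in words)
        feats[("speaker", speaker)] += speaker_weight * len(words)
    return feats


def speech_cohesion(turns, boundary, window=2, speaker_weight=1.0):
    """Cohesion between the `window` turns before and after `boundary`.

    A low value means terms and speakers differ across the candidate
    boundary, which makes it a plausible topic change in this toy model.
    """
    left = window_features(turns[max(0, boundary - window):boundary], speaker_weight)
    right = window_features(turns[boundary:boundary + window], speaker_weight)
    return cosine(left, right)


# Toy transcript: two stories with disjoint vocabulary and speakers.
turns = [
    ("A", ["election", "vote", "poll"]),
    ("B", ["vote", "election"]),
    ("A", ["poll", "vote"]),
    ("C", ["storm", "rain"]),
    ("D", ["rain", "flood", "storm"]),
    ("C", ["flood", "rain"]),
]
scores = [speech_cohesion(turns, b, window=2) for b in range(1, len(turns))]
```

In this toy transcript the score at boundary 3 (between the third and fourth turns) is the lowest, since neither terms nor speakers are shared across it. Setting `speaker_weight=0` reduces the sketch to purely lexical cohesion, which makes it easy to compare the two settings.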


Similar resources

Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion ...


Variation in Language and Cohesion across Written and Spoken Registers

This paper investigates the variation in cohesion across written and spoken registers. The same method and corpora were used as in Biber’s (1988) study on linguistic variation across speech and writing; however, instead of focusing on 67 linguistic features that primarily operate at the word level, we compared 236 language and cohesion features at the text level. Variations in frequencies across ...


Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities

Story segmentation plays a critical role in spoken document processing. Spoken documents often come in a continuous audio stream without explicit boundaries related to stories or topics. It is important to be able to automatically segment these audio streams into coherent units. This work is an initial attempt to make use of informative lexical terms (or key terms) in recognition transcripts of...


Minimum Cut Model for Spoken Lecture Segmentation

We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account long-range cohesion dependencies. Our results demonstrate that global analysis improves the segmentation accuracy and is robust in the presence of speech recognition er...


The Segmentation of Speech

This paper reports a phenomenon supporting the hypothesis that the emergence of structure in the evolution of language was a staged process. To develop a grammatical structure it seems necessary to first have discrete constituents which can be the building blocks of a hierarchical system. By analysing observed speech we show that the development of a linear sequence of grammatical constituents ...




Publication date: 2014